Operating System Support for Database Management

Stonebraker

Some OS policies are a poor fit for databases. The paper argues that many OS policies are inflexible, and that the OS should be able to take “advice” from applications such as DBs.

Buffer Pool Management

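* Fetching a block from the OS buffer pool requires a system call plus a memory-to-memory copy. That costs thousands of instructions even when the block is already cached, which is too slow for servicing DB buffer pool misses.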

* Most OSes use LRU replacement. DBs usually know their access patterns exactly, and for most of those patterns LRU is a poor choice, sometimes the worst possible one. For example, DBs may read blocks sequentially with no rereference, sequentially with cyclic rereference (where LRU misses on every access once the cycle is longer than the pool), or randomly with no rereference. Only rarely do they do random access with rereference, the one pattern LRU handles reasonably well.

* OSes attempt sequential prefetching. DBs know what their access patterns will be and which block to prefetch next, but the OS will always (and only) prefetch the next sequential block, an inflexible policy.

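* To implement write-ahead logging, the DB must be able to force log records to disk before the data pages they describe. This means the DB has to be able to tell the OS to flush pages in that order, rather than letting the OS write dirty pages back on its own schedule.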

=> DBMSes end up doing their own buffer pool management in user space
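
The cyclic-rereference case is easy to check with a tiny simulation. This is a minimal sketch, not code from the paper: `simulate` is a hypothetical helper, and MRU (evict the most recently used block) is swapped in only as an example of a policy a DBMS might ask for when it knows a file is being scanned cyclically.

```python
from __future__ import annotations

from collections import OrderedDict


def simulate(policy: str, capacity: int, refs: list[int]) -> int:
    """Count buffer hits for a reference string under "lru" or "mru" replacement."""
    pool: OrderedDict[int, None] = OrderedDict()  # least recently used first
    hits = 0
    for block in refs:
        if block in pool:
            hits += 1
            pool.move_to_end(block)  # block is now the most recently used
            continue
        if len(pool) >= capacity:
            # last=False evicts the LRU block; last=True evicts the MRU block
            pool.popitem(last=(policy == "mru"))
        pool[block] = None
    return hits


# A 5-block file scanned 4 times in a row through a 4-frame buffer pool.
refs = list(range(5)) * 4
print(simulate("lru", 4, refs))  # → 0: every access misses
print(simulate("mru", 4, refs))  # → 12
```

With a cycle one block longer than the pool, LRU always evicts exactly the block that will be needed next.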
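
The prefetching point can be sketched the same way by counting how often the block fetched in advance is the one actually read next. This is a minimal sketch under stated assumptions: `useful_prefetches`, `os_readahead`, and `db_advice` are hypothetical names, the OS policy is modeled as "always prefetch block + 1", and the index-scan block order is made up for illustration.

```python
from __future__ import annotations

from typing import Callable, Optional


def useful_prefetches(access_order: list[int],
                      prefetch_next: Callable[[int], Optional[int]]) -> int:
    """Count reads whose block was prefetched right after the previous read."""
    useful = 0
    prefetched: Optional[int] = None
    for block in access_order:
        if block == prefetched:
            useful += 1  # this read was already in flight
        prefetched = prefetch_next(block)  # guess the block after this one
    return useful


def os_readahead(block: int) -> Optional[int]:
    # Fixed OS policy: the next block is assumed to be the physically next one.
    return block + 1


def db_advice(plan: list[int]) -> Callable[[int], Optional[int]]:
    # The DBMS knows its access plan, so it can name the real next block.
    # Assumes each block appears at most once in the plan.
    successor = dict(zip(plan, plan[1:]))
    return successor.get


# Hypothetical block order produced by an unclustered index scan.
index_scan = [7, 3, 12, 5, 9]
print(useful_prefetches(index_scan, os_readahead))           # → 0
print(useful_prefetches(index_scan, db_advice(index_scan)))  # → 4
```

On a purely sequential scan such as `[0, 1, 2, 3, 4]`, the OS guess would also score 4. The fixed policy only fails when the access pattern is something other than sequential.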
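
A user-space buffer manager can enforce the write-ahead ordering itself. This is a minimal sketch, assuming LSNs are simply positions in an in-memory log. `Page`, `BufferManager`, and their methods are hypothetical, and `disk_writes` only records the order in which real writes (and forced log flushes) would be issued.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    page_id: int
    page_lsn: int = 0  # LSN of the last log record that changed this page


@dataclass
class BufferManager:
    log: list[str] = field(default_factory=list)          # log records; LSN = position + 1
    flushed_lsn: int = 0                                  # highest LSN already on disk
    disk_writes: list[str] = field(default_factory=list)  # order writes would be issued

    def log_update(self, page: Page, record: str) -> None:
        self.log.append(record)
        page.page_lsn = len(self.log)

    def flush_log(self, up_to_lsn: int) -> None:
        if up_to_lsn > self.flushed_lsn:
            self.disk_writes.append(f"log up to LSN {up_to_lsn}")
            self.flushed_lsn = up_to_lsn

    def flush_page(self, page: Page) -> None:
        # Write-ahead rule: log records for this page reach disk before the page does.
        self.flush_log(page.page_lsn)
        self.disk_writes.append(f"page {page.page_id}")


bm = BufferManager()
p = Page(42)
bm.log_update(p, "set x = 1")
bm.flush_page(p)
print(bm.disk_writes)  # → ['log up to LSN 1', 'page 42']
```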

File System

* Character-sequence files are not the right abstraction for databases. They grow one disk block at a time, so their blocks end up scattered around the disk. Since DBs do considerable sequential access, they would benefit from an extent-based, record-based file system.

* Stonebraker argues that character-sequence files can be implemented efficiently on top of an extent-based, record-based system, but not the other way around.

* He also argues that it is a bit ridiculous to have three different trees on the same system (directory trees, i-node “trees” for files, and DBMS indices) instead of one consolidated tree structure.
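
The scattering effect shows up even in a toy allocator. This is a minimal sketch, assuming two files that grow at the same time and a single free-space pointer. `allocate` and `runs` are hypothetical helpers, and real file systems allocate far more cleverly than this.

```python
from __future__ import annotations


def allocate(n_files: int, blocks_per_file: int, extent: int) -> list[list[int]]:
    """Grow n_files round-robin, one block per step; return each file's disk block numbers.

    extent=1 models block-at-a-time allocation; extent>1 reserves contiguous runs.
    """
    next_free = 0
    layout: list[list[int]] = [[] for _ in range(n_files)]
    reserved: list[list[int]] = [[] for _ in range(n_files)]  # unused part of each current extent
    for _ in range(blocks_per_file):
        for f in range(n_files):
            if not reserved[f]:
                reserved[f] = list(range(next_free, next_free + extent))
                next_free += extent
            layout[f].append(reserved[f].pop(0))
    return layout


def runs(blocks: list[int]) -> int:
    """Number of contiguous runs, roughly the seeks needed to scan the file in order."""
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


# Two files growing at the same time, 8 blocks each.
print(runs(allocate(2, 8, extent=1)[0]))  # → 8
print(runs(allocate(2, 8, extent=8)[0]))  # → 1
```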
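
One direction of the layering claim is easy to sketch: a byte-stream file needs only a `divmod` to map a byte offset onto a record and an offset within that record. This is a minimal sketch with hypothetical class names and fixed-size records. It does not model extents, and it does not show the reverse direction.

```python
from __future__ import annotations


class RecordStore:
    """Hypothetical record-based store: fixed-size records addressed by record number."""

    def __init__(self, record_size: int) -> None:
        self.record_size = record_size
        self._records: list[bytes] = []

    def count(self) -> int:
        return len(self._records)

    def get(self, rid: int) -> bytes:
        return self._records[rid]

    def put(self, rid: int, data: bytes) -> None:
        if len(data) != self.record_size:
            raise ValueError("record has the wrong size")
        while len(self._records) <= rid:
            self._records.append(bytes(self.record_size))  # zero-fill any gap
        self._records[rid] = data


class ByteStreamFile:
    """Character-sequence file on top of RecordStore: byte offset -> (record, offset)."""

    def __init__(self, store: RecordStore) -> None:
        self.store = store

    def write(self, offset: int, data: bytes) -> None:
        size = self.store.record_size
        while data:
            rid, off = divmod(offset, size)
            existing = self.store.get(rid) if rid < self.store.count() else bytes(size)
            record = bytearray(existing)
            take = min(size - off, len(data))  # bytes that fit in this record
            record[off:off + take] = data[:take]
            self.store.put(rid, bytes(record))
            offset += take
            data = data[take:]

    def read(self, offset: int, n: int) -> bytes:
        size = self.store.record_size
        out = bytearray()
        while n > 0:
            rid, off = divmod(offset, size)
            chunk = self.store.get(rid)[off:off + n]  # at most one record's worth
            out += chunk
            offset += len(chunk)
            n -= len(chunk)
        return bytes(out)


store = RecordStore(record_size=4)
f = ByteStreamFile(store)
f.write(0, b"hello, world")
print(f.read(7, 5))   # → b'world'
print(store.count())  # → 3
```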

Scheduling, Process Mgmt, IPC

Two possible architectures: (A1) one DB process for every user, and (A2) one DB server process that services the requests of many user processes.

Arch A1 Disadvantages:

* DB process is put to sleep on every buffer pool miss

* Lock-holder descheduling (akin to priority inversion). A DB process that the OS puts to sleep may be inside a critical section, holding locks that other DB processes need, so those processes stall too.

Arch A2 Disadvantages:

* Server must do its own scheduling between requests, duplicating scheduling functionality the OS already provides.

* Hmm... I’m not quite sure I really understand, or buy, what Stonebraker has against the server process model.
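
To see what A2 pushes into the DBMS, here is a minimal sketch of the scheduling loop that a single server process ends up owning. Python generators stand in for user requests, and each `yield` marks a hypothetical point where a request would block on I/O. `request` and `serve` are illustrative names. A real server would also have to multiplex actual I/O completions.

```python
from __future__ import annotations

from collections import deque
from typing import Generator

Request = Generator[str, None, None]


def request(user: str, steps: int) -> Request:
    """One user's query, split into steps; each yield is where it would wait on I/O."""
    for i in range(steps):
        yield f"{user}:{i}"


def serve(requests: list[Request]) -> list[str]:
    """Round-robin scheduler inside the single server process."""
    ready = deque(requests)
    trace: list[str] = []
    while ready:
        req = ready.popleft()
        try:
            trace.append(next(req))  # run the request up to its next wait point
        except StopIteration:
            continue  # request finished; do not requeue it
        ready.append(req)  # go to the back of the queue behind other users
    return trace


print(serve([request("alice", 2), request("bob", 3)]))
# → ['alice:0', 'bob:0', 'alice:1', 'bob:1', 'bob:2']
```

This loop is exactly the job the OS scheduler would have done in A1, which is the duplication the notes point out.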

Consistency Control

DBs need locking at a finer granularity than whole files: they want page-level or record-level locks. OSes typically provide only file-level locking.
???
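
The difference is easy to see in a lock table keyed by record instead of by file. This is a minimal sketch: `LockManager` and its methods are hypothetical, there is no wait queue or deadlock detection, and a refused request simply returns False.

```python
from __future__ import annotations

from collections import defaultdict


class LockManager:
    """Hypothetical record-level lock table: (file, record id) -> {txn: mode}."""

    def __init__(self) -> None:
        self.holders: dict[tuple[str, int], dict[str, str]] = defaultdict(dict)

    def acquire(self, txn: str, file: str, rid: int, mode: str) -> bool:
        """Grant a shared ("S") or exclusive ("X") lock; False means the caller must wait."""
        held = self.holders[(file, rid)]
        others = [m for t, m in held.items() if t != txn]
        if mode == "X" and others:
            return False  # X conflicts with any lock held by another transaction
        if mode == "S" and "X" in others:
            return False  # S conflicts only with another transaction's X
        if held.get(txn) != "X":
            held[txn] = mode  # keep X if txn already holds it
        return True

    def release_all(self, txn: str) -> None:
        for held in self.holders.values():
            held.pop(txn, None)


lm = LockManager()
print(lm.acquire("T1", "emp", 1, "X"))  # → True
print(lm.acquire("T2", "emp", 2, "X"))  # → True: same file, different record
print(lm.acquire("T2", "emp", 1, "S"))  # → False: T1 holds record 1 exclusively
```

Under file-level locking, T2's request for record 2 would also have had to wait for T1.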

Paged Virtual Memory

OS folks have suggested that the right abstraction for files is to map them into memory, but there are problems:
Some DB files are too big to fit in memory, and even the page table for a mapped file might be too big to fit in memory.
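
A back-of-the-envelope version of the page-table concern follows. The numbers are hypothetical, not figures from the paper. Mapping a file of F bytes with page size P and E-byte page-table entries needs a page table of size T. For a 1 GiB file, 512-byte pages, and 4-byte entries, that works out to:

```latex
T = \frac{F}{P} \cdot E
  = \frac{2^{30}\,\text{B}}{2^{9}\,\text{B}} \cdot 2^{2}\,\text{B}
  = 2^{21} \cdot 2^{2}\,\text{B}
  = 2^{23}\,\text{B}
  = 8\,\text{MiB}
```

This assumes a simple linear page table, so a single mapped file would cost 8 MiB of page table before any of its data is touched.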

Conclusion: Current DBs don’t use OS-provided services because those services are inappropriate or too slow. Future OSes should not try to be all things to all people, and should be more sensitive to the needs of specific applications such as DBs.